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ABSTRACT 

A  study  was  conducted  at  Westinghouse  and  the  University  of 
Maryland  to  survey  and  evaluate  FLIR  target  detection/seg¬ 
mentation  algorithms.  The  study  was  conducted  in  software  on 
a  data  base  of  50  FLIR  images  supplied  by  the  three  branches  of 
the  armed  services.  The  study  concluded  that  three  techniques 
(double-window  filters,  spoke  filters,  and  border  followers) 
perform  fairly  well  and  are  implementable  in  real-time 
hardware. 


INTRODUCTION 

This  report  describes  results  of  a  study  conducted  both  at 
Westinghouse  and  the  University  of  Maryland  under  the 
DARPA  Image  Understanding  Program.  One  objective  of  the 
study  is  to  survey  and  evaluate  FL.1R  target  detection/segmen 
tation  algorithms.  Candidate  algorithms  were  chosen  Irom 
those  developed  at  Westinghouse,  th#  University  of  Maryland 
and  other  academic,  governmental,  and  industrial  organiza¬ 
tions.  Algorithms  from  the  initial  list  were  evaluated  over  a 
common  data  base  if  and  only  if:  they  passed  certain  prelimi¬ 
nary  tests,  performed  well  in  previous  studies,  o.  were  claimed 
to  perform  exceptionally  well  by  their  authors. 

Each  algorithm  was  tested  by  its  author's  organization  on  the 
organization's  own  computer  facility.  The  study  had  one  rule: 
"No  algorithmic  parameters  could  be  changed  by  human  inter¬ 
vention  during  the  evidential  run  through  the  data  base.*"  Re¬ 
finements  could  be  made  to  the  software  before  this  final  pass 
through  the  data  base. 

DATA  BASE  COMPILATION 

A  data  base  of  50,  128  x  128  pixel  FLIR  images  was  com¬ 
piled.  Several  images  from  the  data  base  are  shown  in  Figure  1 . 
The  sources  of  the  images  were  four  larger  data  bases  prepared 
by  the  three  branches  of  the  armed  services.  Twenty  of  the  50 
images  were  constructed  from  64  x  64  pixel  target  images 
which  were  repealed  four  times  by  reflecting  them  about  hori¬ 
zontal  and  vertical  axes  (e.g.,  see  Figure  lb).  These  were  used  to 
test  the  algorithms’  sensitivity  to  the  orientation  of  targets. 

The  data  base  was  compiled  with  a  goal  of  diversity.  Often 
results  are  reported  in  the  literature  in  which  an  algorithm  is 
tested  on  a  very  small  collection  of  similar  images,  This  allows 
an  algorithm  to  be  finely  tuned  to  the  ensemble  statistics.  In 


compiling  a  diverse  data  base,  it  was  hoped  to  simulate  realistic 
conditions  in  which  little  a  priori  information  is  available. 

EVALUATION  APPROACH 

The  data  base  was  carefully  hand  segmented  to  obtain  the 
centroid  (Ct,  C,)  and  vertical  ( R ,)  and  horizontal  (Rj)  radii  of 
each  of  the  targets  (Figure  2).  This  hand  segmentation  was  a 
cooperative  effort  of  several  persons  who  had  knowledge  of  the 
data  base  contents.  All  algorithms  were  required  to  use  this 
same  format  for  output.  That  is,  for  each  detected  target,  in 
each  image,  the  output  is  an  estimate:  (C,,  dJt  R,,  Rj) 

A  target  is  said  to  correctly  detected  iff: 

{|C,  -  C,|  <  '/:R,  +  '/rand  |C,  -  Cj|  <  '/iR,  +  '/i},  (1) 

where  all  measures  are  in  image  picture  elements  (pixels).  A 
false  alarm  is  any  detection  outside  this  prescribed  region,  Extra 
detections  inside  the  prescribed  region  are  each  scored  as  1/3 
false  alarm. 

With  reference  to  Figure  3,  let  B  denote  the  rectilincariy  ori¬ 
ented  box  hat  enclosing  a  target  as  determined  by  hand  segmen¬ 
tation.  Let  B  denote  its  estimate  as  output  by  a  computer  pro¬ 
gram.  Segmentation  accuracy  is  estimated  by: 


or  the  common  area  divided  by  the  total  area  of  the  figures, 
averaged  over  detected  targets.  Note:  0  sA  <=  1 

TESTS  PERFORMED  BY  COMPUTER  SIMULATION 

One  goal  of  this  project  is  to  determine  the  suitability  of  var¬ 
ious  hardware  implementations  of  the  candidate  algorithms. 
This  entails  an  evaluation  of  both  performance  and  implemen- 
tability.  However,  the  tests  discussed  here  took  place  only  in 
software  and  did  not  take  advantage  of  the  additional  informa¬ 
tion  which  would  be  available  to  a  real-time  target  acquisition 
system,  such  as  range  to  center  of  field  of  view.  This  point  will 
be  expanded  upon  below, 

A  target  acquisition  system  typically  consists  of  a  number  of 
pipelined  stages.  The  first  stage  accepts  images  from  the  FLIR 
and  docs  some  preprocessing.  This  Preprocessor  usually  takes 
the  form  of  one  or  more  local  filters  which  reduce  noise,  extract 
local  features,  and/or  increase  the  contrast  between  targets  and 
background, 


Note:  Image  polarity  was  allowed  as  an  input. 


The  second  stage  is  the  Detector,  which  locates  blobs  it  sus¬ 
pects  of  being  targets.  The  centroids  of  these  candidate  targets 
are  then  passed  to  a  Segmetuor  and/or  Feature  Extractor.  This 
third  stage  either  determines  the  borders  of  the  detected  blobs 
or  extracts  features  to  describe  them.  This  information  is  passed 
to  the  fourth  stage,  called  the  Classifier.  This  final  stage  classi¬ 
fies  potential  targets,  prioritizes  them,  determines  confidence 
probabilities,  and  outputs  this  data  along  with  target  locations, 

A  real-time  system  mounted  on  a  moving  vehicle  should  take 
advantage  of  the  following  types  of  additional  information 
which  are  usually  available. 

(1)  Range  -  The  range  to  the  center  of  the  field  of  view  will 
usually  be  known.  This  will  permit  calculation  of  expected  tar¬ 
get  size  in  pixels  at  that  location.  Size  in  pixels  at  other  locations 
in  the  field  of  view  can  be  estimated  from  assumptions  or 
knowledge  of  terrain  geometry. 

(2)  History  -  Past  values  of  any  computed  parameters  can  be 
stored.  (For  example,  the  location  of  a  detected  target  on  past 
frames  can  be  used  to  predict  current  location.) 

(2)  Control  Loops  -  Feedback  loops  can  be  set  up  between 
any  stages.  (For  example,  the  output  of  the  Classifier  can  bo 
used  to  control  thresholds  in  the  Detector.) 

If  a  real-time  system  fits  the  model  described,  then  separate 
ciicuit  boards  might  be  used  for  each  stage.  Boards  implement¬ 
ing  different  algorithms  could  be  interchanged,  and  the  algo¬ 
rithms  tested  on  large  quantities  of  data.  If  board  pin  outs  and 
protocols  are  standardized,  then  even  boards  developed  by  dif¬ 
ferent  companies  could  be  interchanged  and  tested  in  a  compet¬ 
itive  manner.  No  such  standardization  now  exists.  Further¬ 
more,  in  current  real  systems,  the  separation  in  functions 
between  the  various  stages  is  often  not  clearly  defined.  An  algo¬ 
rithm  implemented  on  a  board  may  combine  parts  of  several 
stages.  Even  the  basic  hardware  organization  may  be  different 
from  that  discussed  above,  with  some  stages  operating  in  paral¬ 
lel  or  in  a  different  order  than  indicated.  There  is  always  a 
tradeoff  in  performance  between  the  various  stages,  since  it  is 
only  the  output  of  the  final  stage  which  really  matters.  For  ex¬ 
ample,  it  is  acceptable  for  a  detector  to  operate  at  a  high  false 
alarm  rate  if  a  classifier  will  later  cull  out  counterfeit  targets. 
This  is  an  important  qualification  for  a  study  such  as  this, 
where  individual  algorithms  are  being  tested,  not  entire  sys¬ 
tems.  We  will  deal  in  part  with  this  concern  by  plotting  detec¬ 
tion  rate  vs.  false  alarm  rate,  though  our  results  are  not  as  com¬ 
prehensive  as  may  be  desired. 

CLUSTERING  IN  MEASURE  SPACE 

The  standard  method  of  segmenting  an  image  is  by  gray  level 
thresholding.  Here  the  classes  correspond  to  gray  level  ranges, 
e.g.,  “light  =  hot"  and  “dark  =cooi”.  Since  these  ranges  are  not 
known  in  advance,  ‘hey  must  be  determined  by  examining  the 
gray  level  histogram  anil  looking  for  peaks  (one  dimensional 
clusters),  and  choosing  thresholds  (one  dimensional  decision 
surfaces)  that  separate  the  peaks. 

A  number  of  investigators  have  suggested  that  multidimen¬ 
sional  feature  space  should  also  be  useful  for  segmenting  com¬ 
plex  gray  scale  images.  A  variety  of  features  may  be  defined 
over  a  neighborhood  set  about  a  pixel,  e.g.,  mean,  median,  var¬ 
iance,  commonality,  total  variation.  This  approach  could  be 
employed  when  a  single  feature,  such  as  gray  level,  is  not  ade¬ 
quate  for  segmentation  because  the  given  image  contains  a 
number  of  textured  regions  whose  gray  level  ranges  overlap. 

Initial  work  at  Westinghouse  has  indicated  that  thresholding 
by  cluster  detection  in  histograms  is  not  adequate  for  separating 
targets  from  background.  Gray  level  target  and  background 
clusters  are  often  not  separable,  i.e.,  their  probability  densities 


overlap,  l  ikewise,  the  response  of  local  operators  tends  to  be 
rather  variable,  not  yielding  well  defined  clusters.  The  basic 
weakness  of  segmentation  schemes  which  use  only  local  feature 
values  is  that  they  attempt  to  classify  image  parts  without  re- 
gurd  to  their  relative  positions  in  the  image.  It  should  not  sur¬ 
prise  us  that  any  approach  which  does  not  take  spatial  contigu¬ 
ity  fully  into  account  fails  much  of  the  time. 

ALGORITHMS  SELECTED 

The  segmentation  algorithms  investigated  in  our  study  are 
those  that  make  use  not  only  of  similarity  but  also  proximity. 
The  candidate  algorithms  have  been  classified  into  six  groups: 

(1)  Double  Window  Filters 

(2)  Spoke  Filters 

(!))  Border  Followers 

(4)  Relaxation  Algorithms 

(5)  Pyramid  Approaches 

(6)  Mode  Seekers 

Each  of  these  approaches  will  be  described  below.  At  least 
one  method  of  each  type  will  be  tested  on  the  assembled  data 
base. 

DOUBLE  WINDOW  FILTERS 

A  double  window  filter  slides  two  non-overlapping  windows 
ever  an  image.  The  windows  are  both  commonly  rectangles, 
•*'iifi  one  surrounding  the  other  (Figure  4).  The  intensities 
within  the  inner  window  are  viewed  as  samples  of  a  random 
variable  X  and  those  of  the  outer  window  as  samples  of  a  ran¬ 
dom  variable  Y.  The  objective  is  to  determine  if  the  inner  win¬ 
dow  surrounds  a  target,  while  the  outer  window  contains  back¬ 
ground  cluttei  (Figure  5).  Since  little  a  priori  information  is 
available  about  target  and  background  statistics,  some  assump¬ 
tions  are  usually  made  before  the  problem  is  formulated  and 
solved.  One  way  to  pose  the  problem  is  in  the  language  of  statis¬ 
tical  hypothesis.  Doing  so  usually  involves  choosing  distribu¬ 
tions  to  model  the  behaviors  of  X  and  Y.  Results,  of  course,  are 
valid  only  if  the  chosen  distributions  correctly  describe  the  ex¬ 
perimental  situation,  namely  that  they  provide  the  correct  sta¬ 
tistical  model.  Although  simple  statistical  hypotheses  concern 
only  the  parameters  of  assumed  or  known  distributions,  it  is 
also  possible  to  define  composite  hypotheses  which  also  con¬ 
cern  the  fundamental  forms  of  the  distributions  themselves.  In 
either  case,  to  construct  a  criterion  for  testing  a  given  statistical 
hypothesis  requires  the  formulation  of  an  alternate  hypothesis. 
Symbolically,  we  will  let  H0  stand  for  the  null  hypothesis  and 
Ha  stand  for  the  alternate  hypotnesis, 
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Figure  4.  Window  Geometry 


51 


Figure  5.  Three  cases  of  target/filler  relationship:  (a)  target 
completely  within  inner  window,  (b)  target  partially 
within  inner  window,  (c)  target  outside  inner  win¬ 
dow. 

A  typical  simple  hypothesis  is  as  follows: 

X,,...,  Xm  is  an  independent  random  sample  of  a  normal  ran¬ 
dom  variable  X  with  mean  #ix  and  unknown  vari¬ 
ance  a2x 

Y| . Yn  is  an  independent  random  sample  of  a  normal  ran¬ 

dom  variable  Y  with  mean  jiYand  unknown  variance 

a\ 

H„:  Mx  =  My 

ha:  Mx*My 


This  test  can  be  performed  in  terms  of  u  quantity  f  which  has 
a  t-distribution  with  m  +n  -  2  degrees  of  freedom  (I8j: 


where 

T  =  k(*  ~  Y> 

•J&X2  +  Oy2 

m  n 

lEv' 

i  =  l  i=l  (3) 


1 1„  is  re  jected  at  n  significance  level  a  if  and  only  if  t  a  e,  where 
t  is  the  experimental  value  of  f.  If,  for  instance,  n  =  10,  m  --  6, 
and  n  =»■  0.05,  then  c  -■  2.145. 

If  targets  are  always  known  to  be  hotter  (i.c.,  of  higher  inten¬ 
sity  level)  than  the  background,  the  test  can  be  reformulated  as: 

M,  =  MV 

ha:  m,  <  My  <4> 

This  approach  and  mote  complex  formulations  are  conven¬ 
tionally  used  by  Wextiughouse  in  detection  of  targets  in  radar.  A 
numbet  of  companies  are  using  it  for  target  detection  in  FL!R 
imagery. 

Texas  Instruments  (Tlj  and  Ford  Aerospace  [FA]  have  devel¬ 
oped  filters  of  this  type,  TI  [2]  uses  a  metric  of  the  form: 

c  =  (X  -  Y)2  +  ox2 

Oy  (5) 


FA  [I]  divides  the  outer  window  into  N  subregions.  They  use  a 
metric  of  the  form: 

N 

C  =  £  (lx  -  .v  (k)), 

k  =  I  (6) 

where  Ix  is  the  mean  of  inner  window  pixels  exceeding  a  speci¬ 
fied  threshold  and  IY  (k)  is  the  mean  of  outer  window  pixels  in 
subregion  k  exceeding  the  threshold.  Both  TI  and  FA  use  range 
for  the  control  of  window  size.  Neither  referenced  paper  pro¬ 
vides  ;t  statistical  model  for  tiie  experimental  design. 

WoJlfson  of  Westinghouse  sees  no  justification  in  assuming 
normality  or  any  other  statistical  distribution.  Fie  suggests  an 
approach  based  on  estimates  of  the  probability  density  func¬ 
tions  of  random  variables  X  and  Y  (as  obtained  from  their  sam¬ 
ple  values).  A  computer  program  was  written  to  test  this  con¬ 
cept  .  The  upper  three  bits  of  sample  intensity  levels  were  used  to 
estimate  probability  density  functions  fx  and  fY.  Since  three 
bits  were  used,  these  estimates  in  effect  were  quantized  to  eight 

entries  each;  i.e.,  fx  =  (fx(l) . fx(8)),  fY=  (?Y(1) . ?y(8)). 

A  target  is  detected  if  and  only  if  the  maximum  intensity  level  in 
the  inner  window  exceeds  that  of  the  outer  window  end 

8 

E  |fx(i)  -  ?v(i)l 
i  =  I 

exceeds  a  given  threshold.  Since  range  was  not  available  for 
these  experiments,  five  filters  were  used  in  order  of  decreasing 
size,  with  a  detection  occurring  upon  first  exceeding  the  thresh¬ 
old.  Since  several  detections  often  occur  over  nearby  pixels,  de¬ 
tections  were  spatially  clustered  and  replaced  by  their  cluster 
centroid.  The  associated  window  widths  and  heights  were  aver¬ 
aged  to  form  an  estimate  of  target  extent.  Results  arc  shown  in 
Figures  6  and  7.  The  method  works  fairly  well.  It  has  difficulty 
with  targets  near  image  borders,  but  this  is  not  a  serious  prob¬ 
lem  in  an  actual  target  acquisition  system  where  the  input  imag¬ 
ery  is  typically  375  x  875  pixels  in  size  and  targets  r, over  a  rela¬ 
tively  small  area.  Since  this  method  uses  raw  gray  levels,  rather 
than  edges,  it  is  bound  to  produce  a  fairly  stable  output  from 
frame  to  frame. 
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Figure  6.  Detection  Kate  vs  False  Alarm  Rate 

spoke:  filter 

Minor  and  Sklansky’s  [4|  spoke  filter  is  a  generalization  of 
the  Hough  circle  detector  (5).  The  spoke  filter  uses  an  8-spoke 
digital  mask,  which  can  be  designed  to  detect  either  light  or 
dark  blobs,  as  illustrated  in  Figure  8.  The  arrows  in  the  figure 
define  a  template.  Each  detected  edge  falling  within  the  filter 
area  is  examined  to  determine  if  its  strength  exceeds  a  threshold 
and  matches  the  direction  of  the  corresponding  template  ele¬ 
ment.  An  8-bit  image  size  detection  buffer  is  used  to  store  edge 
matches.  Each  bit  corresponds  to  one  spoke  direction.  The  n-th 
bit  is  set  at  buffer  location  (x,y)  if  at  least  one  match  is  obtained 
along  the  n-th  spoke  when  the  filter  is  centered  at  (x.y).  Upon 
completion  of  the  spoke  filtering  operation,  a  3  x  3  OR  filter  is 
convolved  with  the  detection  buffer.  A  detection  is  then  consid¬ 
ered  obtained  at  position  (x,y)  if  at  least  N  bits  of  the  (x.y) 
location  of  the  buffer  are  set  (typically  N  =  4). 

Minor  and  Sklansky  perform  segmentation  after  detection  is 
completed.  The  segmentor  used  is  a  modified  version  of  the 
University  of  Maryland’s  Superslice  [3,20,21]  algorithm.  Su¬ 
perslice  views  objects  as  being  distinct  from  their  surroundings 
by  the  presence  of  edges  at  their  boundary.  Superslice  starts  by 
choosing  a  threshold  to  segment  the  image  into  a  set  of  con¬ 
nected  components.  It  then  extracts  an  edge  map  from  the  im¬ 
age.  Finally,  it  measures  the  percent  of  each  component’s  border 
which  coincides  with  the  edge  map.  An  extracted  blob  is  one 
which  produces  high  edge-border  coincidence.  A  number  of 
thresholds  are  usually  tried  before  good  edge-border  coinci¬ 
dence  is  obtained.  Different  thresholds  may  be  required  for  ex¬ 
tracting  different  objects  in  the  same  scene.  Minor  and 
Sklansky’s  version  of  Superslice  also  uses  edge  direction  in  the 
measure  of  coincidence. 


Ill  I »  Cl  (ON 
NAT  I  Ul 


MtMIIH  Of  OlMCHORS 


NUMB!  N  Of  TABCm 


0  3  0  «  0  4  0  6  Of  UD  09 

SEGMENT  At  IQ*  ACCUNACT 


Figure  7.  Detection  Rale  vs  Segmentation  Accuracy 


r_ 

lF 

rr 

Z 

- 

i 

\ 

& 

- 

i 

\ 

z 

- 

K 

z 

\ 

IT 

t 

i 

t 

i 

i 

1 

1 

\ 

z 

- 

✓ 

\ 

- 

z 

— 

z 

- 

Figure  8a.  Spoke  Filter  for 
Bright  Objects  on 
a  Dark  Back¬ 
ground  (From  an 
Unpublished  Ver¬ 
sion  of  |4|  ) 


82  2105  V  Fi 


Figure  8b.  Spoke  Filter  for 
Dark  Objects  on 
a  Bright  Back¬ 
ground 


■ 

■ 

■ 

a 

1 

z 

■ 

■ 

■ 

V 

_l 

■ 

a 

B 

■ 

V 

9 

■ 

a 

■ 

v. 

■ 

■ 

■ 

■ 

B 

■ 

Q 

a 

D 

9 

i 

■ 

■ 

i 

t 

t 

1 

■ 

■ 

■ 

a 

9 

Q 

■ 

■ 

a 

E 

z 

V 

□ 

B 

B 

z 

- 

z 

82  2105  V  6b 


53 


A  group  at  Hughes  [6]  briefly  describes  a  method  which  tests 
each  pixel  in  a  region  of  interest  for  boundedness  by  edges  in 
eight  directions.  A  pixel  is  labeled  to  be  interior  to  some  bound¬ 
ary  if  it  is  bounded  by  edges  in  six  of  the  eight  directions.  The 
labeled  image  is  then  thrcsholded  to  extract  a  target  interior.  If 
V  s  \  then  pixels  with  intensity  greater  than  threshold  C  are 
selected,  otherwise  pixels  with  intensity  less  than  C  are  chosen. 
Finally,  a  thinning  and  filling  operation  is  used  to  remove  small 
holes  and  smooth  the  segmented  region.  The  Hughes  report 
supplies  no  statistical  model  from  which  their  thresholding 
equation  is  derived. 

The  author  developed  another  algorithm  based  upon  the  ba¬ 
sic  spoke  filtering  philosophy.  The  algorithm  uses  sets  of  3x3 
pixel  templates  -  with  each  template  designed  to  match  a  section 
of  a  particular  object’s  silhouette.  The  operation  of  the  algo¬ 
rithm  will  be  described  by  example. 

First,  suppose  that  an  algorithm  is  to  be  designed  to  detect 
octagons  of  known  polarity.  A  4-bit  image  size  buffer  (initially 
zeroed)  will  be  used  to  record  detections.  The  algorithm  will 
start  by  scanning  across  each  of  the  rows  of  an  image,  applying 
a  3x3  vertical  edge  detector  at  each  point  visited,  When  a  left 
blob  edge  is  detected,  its  position  will  be  noted  (Figure  9a). 
The  scan  will  continue  until  a  right  blob  edge  is  detected  (Figure 
9b).  At  this  time,  a  0001 2  will  be  OR’d  with  the  contents  of  the 
buffer  at  a  location  corresponding  to  the  midpoint  between  de¬ 
tected  edges  (figure  9c).  This  process  is  repeated  upon  the  image 
columns  and  in  the  two  diagonal  directions  using  the  markers 
shown  in  Figure  10a. 

At  the  completion  of  the  scans  in  the  four  directions,  we 
would  expect  to  find  a  1 1 1 1 2  at  buffer  locations  corresponding 
to  centers  of  octagons  in  the  original  image.  Since  this  may  not 
always  occur,  due  to  the  discrete  nature  of  the  image  geometry, 
a  3  x  3  OR  filter  is  convolved  with  the  detection  buffer. 

Now  suppose  that  we  want  to  detect  military  vehicles.  Instead 
of  applying  edge  detectors  in  the  four  scan  directions,  feature 
detectors  will  be  applied  which  correspond  to  shapes  located 
around  the  sought  vehicles' silhouettes.  Also,  the  diagonal  scan 
directions  should  not  be  exactly  45  degrees  to  image  rows,  but 
rather  should  be  related  to  the  average  height  to  width  ratio  of 
targets  (Figure  10b). 
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Figure  10a.  Templates  and  Scanning  Directions  for 
Octagon  Detector 


Figure  10b.  Templates  and  Scanning  Directions  for  Military 
Vehicle  Detector  (A  Succession  of  &  Features  Rep¬ 
resents  a  Particular  Template) 

Upon  completion  of  the  four  scans,  a  spatial  clustering  algo¬ 
rithm  is  applied  to  detections,  with  cluster  points  replaced  by 
their  cluster  centroid. 

The  algorithm  performs  very  well  as  a  detector  as  seen  in 
Figure  6.  Detection  rate  was  67  percent  with  0.69  false  alarms 
per  target  The  algorithm  was  also  evaluated  as  a  seginentor 
(Figure  7)  with  the  distances  between  a  centroid  and  the  hori¬ 
zontal  and  vertical  feature  detections  used  to  estimate  target 
radii.  Segmentation  accuracy  was  not  very  good,  with  targets 
qsually  judged  smaller  than  they  actually  were. 

BORDF.R  FOLLOWING/EDGE  BASED  SEGMENTATION 

A  blob  outline  containing  no  breaks  is  a  closed  curve  separat¬ 
ing  the  blob  from  its  background.  This  curve  may  pass  through 
the  same  pixel  twice  if  the  blob  has  a  narrow  neck  near  or  at  this 
pixel.  A  number  of  border  following  algorithms  are  described 
in  the  image  processing  literature,  e  g.,  see  [7-10].  They  start  by 
locating  a  prominent  border  point,  and  continue  by  visiting  ad¬ 
jacent  border  points  (edge  elements)  in  sequence,  eventually  re¬ 
turning  to  the  starting  point.  Difficulties  arise  when  an  object’s 
boundary  is  not  distinct  over  its  entire  length. 

Schenker  and  Cooper  [7,9)  describe  an  elaborate  algorithm 
which  seeks  the  most  probable  blob  boundary  by  conducting  an 
exhaustive  search  over  all  boundaries.  The  “true  boundary"  is 
then  viewed  as  the  one  which  maximizes  the  joint  likelihood  of 
the  observed  data  and  a  hypothesized  boundary.  Since  an  ex¬ 
haustive  search  proves  to  be  computationally  prohibitive,  sub- 
optimal  search  algorithms  are  also  proposed. 

Perkins  [15],  while  with  General  Motors  Research  Lab,  de- 


54 


veloped  an  object  detection/segmentation  algorithm  for  use  in 
a  robot  vision  system.  The  algorithm  is  implemented  by  a  se¬ 
quence  of  simple  operations.  First,  an  edge  map  is  obtained. 
Next,  uniform  intensity  regions  are  extracted  by  expanding  ac¬ 
tive  edge  regions,  labeling  the  segmented  uniform  intensity  re¬ 
gions,  and  then  contracting  the  edge  regions.  The  region  with 
the  most  points  along  the  image  border  is  assumed  to  be  the 
background.  Finally,  foreground  region  boundary  points  are 
visited,  the  locations  of  points  which  are  roughly  equidistant 
along  a  bounda.y  are  stored  in  an  array.  The  performance  of 
Perkins'  algorithm  as  a  FLIR  target  detector  is  not  particularly 
good  (Figure  6).  The  detection  rate  over  the  test  data  base  was 
only  50  percent.  The  algorithm  as  implemented  at  GM  does  not 
use  image  polarity  and,  as  a  consequence,  sometimes  misses 
nearby  targets,  while  indicating  a  detection  half-way  between 
them.  The  performance  of  the  algorithm  as  a  segmentor  is  ex¬ 
cellent  (Figure  6). 

A  simple  border  follower  was  programmed  and  tested  at 
Westinghouse.  A  detection  rate  of  61  percent  was  obtained  at  a 
very  high  false  alarm  rate  (Figure  7). 

RELAXATION  ALGORITHMS 

Rosenfeld  et  al.  [12,  19,  23),  Kirby  [131,  Peieg  [14),  and 
others  have  viewed  image  segmentation  as  a  graph  labeling 
problem.  The  proposed  approximate,  iterative,  parallel  solu¬ 
tions  are  called  “relaxation  methods”  which  work  as  follows. 
First  an  initial  set  of  labels  is  assigned  to  each  image  node  based 
upon  local  image  properties.  The  labels  at  each  node  are  given 
weights  between  0  and  I .  This  initial  set  of  assignments  may  be 
ambiguous  or  incorrect  in  spots  because  of  the  imperfect  nature 
of  the  local  image  measures  or  noise  in  the  image.  An  iterative 
process  using  information  obtained  from  neighbors  is  then  used 
to  improve  the  initial  decisions.  The  weights  at  each  node  are 
simultaneously  updated  at  each  iteration  (typically)  based  upon 
the  weights  at  node  neighbors  during  the  previous  iteration. 
This  has  the  effect  of  altering  the  probabilities  initially  assigned 
to  noise  points  to  make  them  more  consistent  with  their  sur¬ 
round.  The  process  is  stopped  when  the  labeling  of  all  nodes 
seems  to  reach  a  steady  state. 

To  apply  2-labe!  relaxation  to  segmentation  [23J,  a  set  of 
"light"  and  “dark"  probabilities  is  assigned  to  image  nodes 
based  upon  their  gray  levels.  Probabilities  at  each  node  are  then 
iteratively  adjusted  based  upon  probabilities  at  neighboring 
nodes,  i.e.,  light  reinforces  light  and  dark  reinforces  dark. 
Eventually,  the  blob  pixels  should  become  uniformly  light,  and 
the  background  uniformly  dark,  so  that  segmentation  is  easy. 
The  blob  region  is  extracted  by  locating  connected  components. 
A  3-Iabel  scheme  uses  labels  of  “object",  "background",  and 
“clutter”. 

The  relaxation  algorithms  tested  performed  rather  poorly  as 
target  detectors  (Figure  6). 

PYRAMID  APPROACHES 

Let  the  size  of  an  image  be  2"  x  2"  pixels.  Reduced  resolution 
versions  can  be  linked  with  a  pyramid  data  structure.  The  top 
(level  0)  of  the  pyramid  will  be  a  size  1  x  1  image.  The  original 
2"  x  2n  image  will  be  at  the  bottom  of  the  pyramid.  At  level  K 
wUI  be  an  image  of  size  2k  x  2k.  A  number  of  different  schemes 
have  been  developed  for  detecting  blobs  with  pyramids  [24). 

The  pyramid  linking  scheme  of  Burt,  Hong,  and  Rosenfeld 
[28]  works  as  follows.  A  father-son  relationship  is  defined  be¬ 
tween  nodes  of  adjacent  levels  of  the  pyramid.  Each  node  at 
level  K  is  the  father  of  a  2  x  2  array  of  “candidate  son”  nodes 
at  level  K  +  1 .  Likewise,  each  node  at  level  K  is  the  son  of  four 
"candidate  father"  nodes  at  level  K-l.  At  each  iteration  of  the 


pyramid  linking  algorithm,  a  node  is  linked  to  the  “most  simi¬ 
lar"  of  its  four  candidate  fathers.  A  node  is  then  assigned  the 
average  gray  level  of  those  sons  linked  to  it.  The  process  con¬ 
tinues  in  this  manner  until  a  steady  state  is  reached.  The  links 
then  define  trees,  and  the  leaves  of  "light"  trees  are  target  seg¬ 
ments.  The  particular  pyramid  linking  algorithm  tested  did  not 
perform  well  as  a  detector,  yielding  loo  many  false  alarms* 
(Figure  6). 

MODE  SEEKERS 

There  is  a  class  of  techniques  in  which  a  pixel  is  iteratively 
replaced  by  the  average  of  a  selected  set  of  its  neighbors.  Naray¬ 
anan  and  Rosenfeld  [29]  describe  a  method  that  chooses  those 
neighbors  for  averaging  which  belong  to  the  same  histogram 
peak  as  the  given  pixel.  The  simplest  version  of  this  method 
chooses  those  neighbors  higher  up  on  the  image  histogram 
peak.  This  tends  to  move  pixel  gray  levels  toward  their  subpop¬ 
ulation  modes.  An  improved  version  chooses  a  neighbor  only  if 
there  is  no  significant  concavity  in  the  histogram  between  it  and 
the  given  pixel.  Ideally,  upon  termination,  there  will  be  a  two- 
spiked  histogram,  with  the  lighter  spike  formed  from  target 
pixels.  Target  regions  are  then  obtained  by  extracting  connected 
components. 

A  local  version  of  the  algorithm  performs  processing  over 
relatively  small  image  windows  in  sequence,  while  the  global 
version  (named  Global  Superspike)  uses  the  histogram  for  the 
entire  image.  The  global  version  was  tested  since  the  images  in 
the  data  base  are  only  128  x  120  pixels  in  size.  Its  detection  rate 
was  88  percent  (Figure  6). 

EVALUATION 

The  mode  seeker  had  the  highest  detection  rate  of  any  of  the 
methods  tested.  Its  segmentation  accuracy  fell  in  the  65  percent 
to  75  percent  range,  as  did  most  of  the  other  methods. 

Spoke  filters  perform  very  well  as  detectors.  But  they  have  a 
low  segmentation  accuracy,  and  may  need  to  be  followed  by  a 
separate  segmentor  such  as  Superslicc. 

Double  window  filters  work  well  as  detectors.  They  can  also 
be  used  for  segmentation  if  only  target  extents  are  required.  But 
preferably,  they  should  be  designed  to  seek  out  targets  of  partic¬ 
ular  sizes  if  range  is  available  as  an  input.  If  targets  of  several 
sizes  and  orientations  arc  sought,  then  a  different  filter  is  re¬ 
quired  for  each.  Each  filter  must  vary  in  size  down  the  image  to 
correct  for  perspective.  This  has  a  multiplicative  effect  on  hard¬ 
ware  size  and  cost. 

Simple  border  followers  must  be  run  at  a  high  false  alarm 
rate  to  yield  a  reasonable  detection  rate.  They  are  well  suited  for 
segmentation,  yielding  estimates  of  target  borders.  If  an  entire 
image  is  available  for  processing,  the  border  following  can  start 
at  the  most  prominent  border  point  and  proceed  from  there,  if 
an  image  is  to  be  processed  one  line  at  a  time  (on-the-fiy),  then 
the  topmost  target  point  must  be  detected  to  initialize  the  fol¬ 
lower.  This  second  approach  may  have  difficulty  with  non-con- 
vex  targets  but  is  easier  to  build  into  hardware. 

Border  followers  and  double  window  filters  can  be  readily 
implemented  in  haidware.  However,  as  noted  above,  proper  im¬ 
plementation  of  the  double  window  filter  requires  considerable 
computation  power.  The  spoke  filter  in  its  purest  form,  as  pro¬ 
posed  by  Minor  and  Skiansky,  is  not  well  suited  for  hardware 


*  Note:  Another  pyramid  scheme  was  recently  tested  at  the 
University  cf  Maryland  on  this  data  with  much  better 
results. 
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implementation.  It  is  computationally  expensive  requiring  the 
repeated  access  of  a  large  array  of  edges  (or  pixels)  as  the  filter 
sweeps  over  the  image.  The  algorithm  can,  however,  be  reform¬ 
ulated  in  a  number  of  different  ways  to  meet  the  requirements 
of  particular  computer  architectures. 

Relaxation  algorithms  are  best  implemented  in  a  cellular  ar¬ 
ray  architecture  having  one  processor  per  pixel.  No  one  has  ever 
built  a  target  acquisition  system  using  this  architecture;  to  do  so 
must  therefore  be  viewed  as  a  risky  venture.  The  same  holds 
true  for  pyramid  approaches. 

There  are  two  difficulties  with  implementing  the  mode  seeker. 
A  FLIR  image  is  typically  875  x  875.  A  mode  seeker  should  be 
implemented  on  smaller  image  subregions,  possibly  horizontal 
strips  (range  zones).  Secondly,  the  required  post-processing  step 
of  extracting  connected  components  is  rather  difficult  to  imple¬ 
ment  in  real  time  hardware.  However,  the  image  smoothing  al¬ 
gorithm  (29),  upon  which  the  mode  seeker  is  based  would  be  an 
excellent  preprocessor  for  any  detection  algorithm.  An  analysis 
of  its  hardware  implementability  is  not  yet  complete. 

CONCLUSIONS 

The  following  conclusions  were  drawn  from  the  study. 

•  Mode  seekers,  double  window  filters,  and  spoke  filters  all 
show  promise  as  targets  detectors.  Border  followers  and 
possibly  mode  seekers  (if  hardware  implementation  can  be 
worked  out)  are  fair  segmentors. 

•  No  existing  computer  architecture  appears  appropriate  for 
the  real-time  implementation  (on  one  or  two  circuit 
boards)  of  all  detect ion/segmentatien  algorithms.  A  spe¬ 
cial  architecture  is  required  for  the  efficient  implementa¬ 
tion  of  each  particular  algorithm. 

It  was  originally  intended  to  test  an  artificial  intelligence  (AI) 
approach  to  improving  algorithm  performance.  This  appears  to 
be  much  more  difficult  than  initially  thought.  The  investigation 
has  not  revealed  any  real-time  target  acquisition  system  which 
uses  AI  concepts.  It  is  concluded  that  for  the  present  efforts  are 
better  directed  to  the  use  of  available  information,  such  as 
range  and  feedback  data.  At  a  later  time,  an  attempt  can  be 
made  to  incorporate  a  simple  knowledge  base  and  a  limited  rea¬ 
soning  ability.  Several  possible  examples  might  be: 

•  Ground  vehicles  are  likely  to  be  located  below  the  horizon. 

•  Targets  of  the  same  type  often  appear  in  groups,  sometimes 
in  moving  columns  on  or  near  roads. 

•  Gathered  intelligence  indicating  presence  of  different  tar¬ 
get  types  is  often  available. 

A  STRATEGY 

A  sound  strategy  is  to  precede  detection  by  preprocessing 
filters  designed  to  supress  noise  and  "bring  out”  targets.  The 
Narayan  filter  may  be  of  use  here.  This  should  be  followed  by 
an  initial  detection  algorithm  which  is  cheap  to  implement  in 
hardware,  but  which  is  not  necessarily  a  "star”  performer.  This 
algorithm  should  be  run  at  a  low  threshold  (i.e.,  high  false 
alarm  rate)  to  make  sure  that  most  targets  are  detected.  This 
should  be  followed  by  a  more  expensive  detection/segmenta- 
tion  algorithm  which  will  operate  only  on  the  initial  detections. 
Performance  should  be  improved  by  target  tracking,  frame-to- 
frame  integration  of  extracted  data,  and  decision  smoothing. 
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